The views and conclusions contained in this document are those of the author(s) and should not
be interpreted as necessarily representing the official policies, either expressed or implied, of the
Defense Advanced Research Projects Agency or the U.S. Government.


DESCRIPTION OF PROGRESS


Investigations of several subproblems in the area of derivation of parallel programs were continued
during the current quarter. These investigations include:


(1) Michael Landis (graduate student), John Reif (PI), and Robert Wagner (Duke faculty):
Intermediate Representation for Parallel Implementation

Our research efforts are just beginning to focus on the possibility of extending a high-level
data-parallel language with constructs for process parallelism. Our goal is to begin with a data-parallel
language like NESL, which is under development by Guy Blelloch at Carnegie Mellon University.
This language provides nested data-parallelism. We believe that by extending it with
process-parallel primitives, the language will have wider applicability yet can still be
implemented efficiently.

This work has grown directly out of work over the past year in trying to develop extensions to a low-level
data-parallel  intermediate  representation  to  accommodate  asynchronous  processes.  After  extensive 
research  we  have  decided  that  parallel  process  extensions  to  a  low-level  intermediate  representation 
are  not  practical  because  of  fundamental  differences  in  primitives  provided  by  different  hardware 
vendors.  Instead,  we  are  focusing  our  efforts  currently  on  the  extension  of  a  run-time  library  for 
implementing  data-parallel  languages.  This  library  will  provide  the  support  for  high-level  language 
development  while  maintaining  portability  and  efficiency  through  the  use  of  the  C  language. 

As an example, one possibility which we are investigating is the integration of the POSIX thread
package with CVL, the C Vector Library under development at CMU. In order to gain experience
with parallel systems and with the implementation of CVL, Mike Landis is collaborating with
Blelloch's team in order to develop a multiprocessor CRAY implementation of CVL. This work
should be done around the end of October, at which time we will focus on the extensions to this
library.

(2)  Michael  Landis  (graduate  student),  John  Reif  (PI),  and  Robert  Wagner  (Duke  faculty): 
Data  Movement  on  Processor  Arrays 

We  are  still  developing  ways  of  evaluating  uniform  expressions  in  near  minimum  parallel  time  on 
processor arrays. A paper describing the solution on two-dimensional arrays has been submitted for
journal  publication;  an  abstract  of  this  paper  follows. 

"Evaluating  Uniform  Expressions  Within  Two  Steps 
of Minimum Parallel Time",

Robert  A.  Wagner 

ABSTRACT 

Consider an array of Processing Elements [PEs], connected by a 2-dimensional grid network, and
holding at most one operand of an expression in each PE. Suppose that each PE is allowed, in any
one parallel step, to receive one item of data from any of its 4 immediate neighbors, and to transmit
one datum, as well. How can an associative operator, such as addition, combine all the operands,
using as little time for communication as possible? An expression using such a single operator is
termed a uniform expression. When the total number of communication links used is the measure
of goodness, this problem becomes a Steiner Tree problem, in the Manhattan Distance metric.
When the measure is minimizing the parallel time to completion, a method for solving this problem
is given which is optimal to within an additive constant of 2 time-steps. The method has
applications when the operands are matrices, spread over an array of PEs, as well. Some lower
bounds for this problem, in more general networks, are also proven.
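As a rough illustration of the communication model in the abstract (and not the method of the paper), the following Python sketch computes two elementary lower bounds on the number of parallel steps needed to combine operands on a 2-D grid. It assumes the final result must end up on a single PE; the function names are hypothetical. The first bound follows because a PE receives at most one datum per step, so after t steps a value can depend on at most 2^t operands. The second follows because every operand must travel at least its Manhattan distance to the PE that holds the result.

```python
def manhattan(a, b):
    """Manhattan distance between two grid positions (row, col)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def lower_bound_steps(operands):
    """Return a simple lower bound on parallel communication steps.

    operands: list of distinct (row, col) positions, one operand per PE.
    This is an illustrative bound only, not the paper's algorithm.
    """
    k = len(operands)
    if k <= 1:
        return 0
    # Fan-in bound: ceil(log2(k)), computed exactly with integers.
    fan_in = (k - 1).bit_length()
    # Distance bound: the best meeting PE lies inside the bounding box,
    # since clamping a point into the box never increases any distance.
    rows = range(min(p[0] for p in operands), max(p[0] for p in operands) + 1)
    cols = range(min(p[1] for p in operands), max(p[1] for p in operands) + 1)
    radius = min(
        max(manhattan((r, c), p) for p in operands)
        for r in rows
        for c in cols
    )
    return max(fan_in, radius)
```

For four operands on the corners of a 2 x 2 block, both bounds equal 2; for two operands ten links apart, the distance bound of 5 dominates.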

Current work

Our current work in this area is to extend this parallel reduction operation to higher dimensional
grids. This work is nearing its successful completion. Robert Wagner and Mike Landis
have developed a method for performing reductions on multidimensional
processor arrays that is within a few steps of a provably minimal time. They
are currently finishing their paper, which is a follow-up to Robert Wagner's
paper, "Evaluating Uniform Expressions Within Two Steps of Minimum Parallel
Time." That paper solved the problem for two-dimensional arrays only.

Future  work 

In looking at the problems of distributing collection-oriented operations and collections across
many MIMD processors, the question of data-communication cost comes up.

Suppose PEs are connected in a 2-D grid, with the property that any PE can receive a datum from
any one neighbor at a given time-step. (Other network models and communication behavior
schemes are also possible.) In this setting, ignoring operation costs, we've studied the problem of
computing Σ v_i, where each v_i is originally located on a different PE. The goal is to minimize
parallel communication time. We have written a TR that solves the problem within 2 steps of
optimality regardless of the initial placement of the operands in the grid.

This  sort  of  study  opens  an  area  of  research,  delimited  by  choosing  different  combinations  of 
assumptions  about  PE  behavior,  network  topology,  and  problems  to  be  solved.  A  systematic 
study  of  these  questions,  oriented  toward  problems  which  are  communication-intensive  and  of 
interest to people developing packages like LINPACK, seems in order.


(3)  Peter  Mills  (Research  Associate)  with  John  Reif:  Abstractions  and 
Transformational  Implementation  for  Loosely  Synchronous  and 
Asynchronous  Parallel  Algorithms 

Recent research has focused on extension of high-level parallel computation models with
abstractions for asynchronous tagged memory communication and distribution of computational
resources; development of intermediate representations for these abstractions; and
transformation techniques to realize these abstractions on practical parallel machines. We have
developed a new parallel programming construct, the rate construct, which constrains relative
rates of progress of parallel processes. We have specified its semantics and demonstrated its utility
through application to several algorithms. The rate construct attempts to fill a gap in the
expression of resource requirements such as computational progress which real-time constructs
do not adequately address. The succinct expression of such requirements is needed to explore
parallel algorithms at a suitable level of abstraction. These and other wide-spectrum constructs
we are developing provide a rich vehicle for expressing parallel algorithms and serve as a
concrete carrier for refinement techniques, forming a key component in practical implementation
of the parallel algorithms.

This is a top-down approach; we are incorporating these abstractions into our
architecture-independent parallel language Proteus and developing refinement techniques targeting
lower-level languages such as the C Vector Library (CMU). At the same time we are
investigating extending an existing widely portable data-parallel language, CMU's NESL
(supporting nested data parallelism), with a wrapper for asynchronous parallelism built on
shared state and result-parallelism (MultiLisp futures). The intent is to extend, and thus
capitalize on, their existing techniques for refining nested data parallelism to vector models.

Recent  Accomplishments: 

(1) Rate Control as a Resource Distribution Abstraction:

The rate construct specifies constraints on the relative rates of progress of tasks executing in
parallel, where progress is the amount of computational work as measured by elapsed ticks on a
local virtual clock. By prescribing expected work, rates abstractly specify the allocation of
processors to tasks needed to achieve that work, effected for example by load balancing. We
have developed a simple notion, fixed-rate, which apportions an unvarying percentage of
processor time to each process with no assumptions of global clock synchronization between
processors. The notion is generalized to encompass distributed systems and time-varying
processor workloads by measuring computational work in terms of local virtual clocks updated
in user-specified increments by process events.
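The fixed-rate notion can be illustrated with a small single-processor simulation. The sketch below is not Proteus syntax and not the semantics specified in our work; it is a minimal proportional-share scheduler, assuming integer rates and one unit of work per tick. The function name `run_fixed_rate` and the tie-breaking rule are hypothetical choices made for the example.

```python
from fractions import Fraction


def run_fixed_rate(rates, total_ticks):
    """Simulate tasks sharing one processor in proportion to their rates.

    rates: dict mapping task name -> positive integer rate.
    Returns a dict of local virtual clocks (ticks of work done per task).
    """
    clocks = {name: 0 for name in rates}
    for _ in range(total_ticks):
        # Run the task that is furthest behind relative to its rate.
        # Fractions keep the comparison exact; ties break by task name.
        name = min(rates, key=lambda n: (Fraction(clocks[n], rates[n]), n))
        clocks[name] += 1  # advance that task's local virtual clock
    return clocks
```

With rates 3 and 1, the faster task receives three quarters of the ticks, which is the unvarying percentage that a fixed-rate specification prescribes.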

The utility of the rate construct has been evidenced for a variety of problems, including
weighted parallel search for a goal, adaptive many-body simulation in which rates abstract the
requirements for load-balancing, and variable time-stepped computations in which the use of
fictitious rates can alter the frequency of asynchronous iterations. We are currently investigating
means of transforming rate primitives to lower-level real-time and scheduling constructs.

(2) Abstractions for Tagged-memory:

We are integrating the tagged-memory model into the parallel language, Proteus, to serve as a
vehicle for our refinement techniques. We have extended our set-theoretic parallel language
Proteus with synchronization variables. Synchronization variables, common to coordination
languages such as PCN, are a synchronization mechanism in which processes must wait for a
referenced uninitialized variable to become defined. We extend this model with other
shared-memory abstractions for specifying process topology and directing non-local references to
processes. Synchronization variables, in combination with other Proteus features such as
barriers for expressing loosely synchronous computations (SPMD and SIMD), prove
particularly advantageous in that they can be used to map directly to tagged-memory machines
such as the J-machine, or serve as a foundation for implementing primitives on other machines.
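The behavior of a synchronization variable can be sketched in a conventional threaded setting. The Python class below is only an illustration of the wait-until-defined semantics described above; it is not the Proteus or PCN implementation, and the class name `SyncVar` and its methods are hypothetical.

```python
import threading


class SyncVar:
    """Single-assignment variable: readers block until it is defined."""

    def __init__(self):
        self._defined = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def define(self, value):
        """Give the variable its value; a second definition is an error."""
        with self._lock:
            if self._defined.is_set():
                raise RuntimeError("synchronization variable already defined")
            self._value = value
            self._defined.set()  # wake every waiting reader

    def read(self, timeout=None):
        """Block until the variable is defined, then return its value."""
        if not self._defined.wait(timeout):
            raise TimeoutError("synchronization variable never defined")
        return self._value
```

A reader thread that calls `read()` before any writer has called `define()` simply suspends, which mirrors the presence-tag behavior of a tagged-memory machine.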

Ongoing  work: 

Development of an MSIMD language extending CMU's nested data-parallel language NESL
with a wrapper for asynchronous process execution based on shared variables and MultiLisp
futures.

Development of refinement techniques for transforming extended NESL to threads of vector
code, targeting such machines as Cray YMP and CM-5.

Demonstration of viability of these techniques through concrete implementations of N-body
algorithms,  specifically  clustering  and  Fast  Multipole  Methods,  targeting  asynchronous 
collections  of  SIMD  machines. 

Extending models of parallel computation with real-time properties, such as processor rates, in
order to support timing analysis. Current models, variants of the PRAM such as the APRAM
and HPRAM, only accommodate asynchrony of control or hierarchy of control and
communication.

(4) Peter Su (postdoc) and John Reif: Implementations of Parallel Algorithms in
Computational Geometry

We have been working on the implementational aspects of parallel algorithms. Specifically, we
have been studying parallel algorithms for constructing Voronoi Diagrams and related problems.
Our interest in this study is not only to build effective algorithms for these problems, but also to
consider the kinds of tools that make such work easier and more effective.

Our work has been broken up into three stages:

(1) Study the theory and practice of conventional algorithms for this problem.

(2) Study the current body of theoretical work on parallel algorithms for this problem.

(3) Using the knowledge gained in (1) and (2), design and implement parallel algorithms for
this problem on several machines. Then study the performance of these algorithms and how well
the theoretical results match the behavior of the implementation.

We  have  been  actively  working  on  stages  (1)  and  (2)  for  the  last  few  months  and  we  are  now 
ready to move on to stage (3). The study of practical sequential algorithms has been especially
helpful in the pursuit of simple and efficient parallel algorithms, since they provide a good set of
ideas  to  extend  and  refine  in  a  parallel  setting. 

Using  the  experience  that  we  gain  from  this  work,  we  are  also  investigating  and  planning  tools 
that  could  aid  the  programmer  in  implementing  effective  parallel  algorithms.  Since  many  parallel 
algorithms,  especially  in  computational  geometry,  have  similar  structure,  one  could  imagine  a 
tool  for  reasoning  about  abstract  classes  of  algorithms.  In  particular,  such  a  system  could  aid  the 
programmer  in  tuning  performance  parameters  for  specific  machines  based  on  architectural 
characteristics such as global memory bandwidth and latency, processor speed, local memory
size, and so on. Also, more basic tools for doing visualization and performance analysis are
needed to help the programmer carry out effective experimental analysis of his implementations. Tools
for profiling, animation, simulation and data analysis would all be extremely useful in these
settings.  At  this  point,  there  are  no  such  tools  widely  available  to  the  research  community. 

Initial implementation of the ideas in this work has begun and has been successful. I presented
a paper at the DAGS conference this summer that describes Cray algorithms for basic proximity
problems. I have also begun to explore implementations of the other ideas on various
machines, including the MasPar MP-1, the CM-5, and the KSR-1. This development work will
make up a large part of my PhD thesis, which should be finished by this spring.

(5)  Shenfeng  Chen  with  John  Reif:  Parallel  Sort  Implementation 

The fastest known sort is a parallel implementation of radix sort on a CRAY, due to CMU's Guy
Blelloch. The current sorting algorithms on parallel machines like the Cray and CM-2 use radix and
bucket sort, but they do not take advantage of the possible distribution of the input keys. We are
developing an algorithm using data compression to achieve a fast parallel algorithm which exploits
this distribution. We expect the new algorithm to beat the previous fastest sort by a few factors.

We are working to implement this new parallel sorting algorithm on various parallel machines.

Details:

Radix sort is very efficient when the input keys can be viewed as bits. But the basic radix sort is
not distribution-based, so it needs to look at all digits of every key.
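For reference, a basic least-significant-digit radix sort can be sketched as follows. This is a sequential illustration in Python, assuming non-negative integer keys; it is not the CRAY implementation mentioned above, and the parameter `bits_per_digit` is a hypothetical name. Note that every pass touches every key, whatever the key distribution.

```python
def radix_sort(keys, bits_per_digit=8):
    """LSD radix sort for non-negative integers (illustrative sketch)."""
    out = list(keys)
    if not out:
        return out
    mask = (1 << bits_per_digit) - 1
    max_key = max(out)
    shift = 0
    # One stable bucketing pass per digit, least significant digit first.
    while (max_key >> shift) > 0:
        buckets = [[] for _ in range(mask + 1)]
        for key in out:
            buckets[(key >> shift) & mask].append(key)
        out = [key for bucket in buckets for key in bucket]
        shift += bits_per_digit
    return out
```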

Our approach is to find the structure (distribution) of the input. This is achieved by sampling
from the original set. Then a hash table is built from those sample keys. All keys are indexed to
buckets separated by consecutive sample keys. A probability analysis shows that the largest set
can be bounded within a constant factor of the average size.

The indexing step is made faster by binary searching the hash table for a match. From a previous
result, each hash function computation needs only constant time.
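The sampling and indexing steps can be illustrated with a simplified sequential sketch. The Python code below uses a sorted list of sample keys and binary search in place of the hash table and compression machinery of our algorithm, so it shows only the bucket structure, not the claimed running time. The function name, the sample size and the fixed seed are assumptions made for the example.

```python
import bisect
import random


def sample_bucket_sort(keys, sample_size=16, seed=0):
    """Sort by indexing keys into buckets delimited by sampled keys."""
    keys = list(keys)
    if len(keys) <= sample_size:
        return sorted(keys)
    rng = random.Random(seed)
    # Consecutive sample keys act as bucket boundaries.
    splitters = sorted(rng.sample(keys, sample_size))
    buckets = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        # Binary search finds the bucket between two consecutive samples.
        buckets[bisect.bisect_right(splitters, key)].append(key)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # each bucket is small on average
    return result
```

Because the boundaries are drawn from the input itself, dense regions of the key space receive more buckets, which is how the approach adapts to the input distribution.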

Our algorithm needs O(n log log n) time sequentially, given that the compression ratio of the given
input set is not too big. In parallel, our algorithm works well in chain-sorting. In list ranking
sorting, the total work is also reduced.

We have implemented this algorithm on a Sparc II and compared its performance with the system
routine quicksort. It turns out that our algorithm outperforms quicksort() for a sufficiently large
number of keys (32M). Thus, it may find its place in sorting for large database operations (e.g.,
as required by join operations). In these applications the keys are many words long, so our
algorithm is even more advantageous in this case, where the cutoff is much lower.

(6)  Deganit  Armon  (A.B.D.)  with  John  Reif:  Parallel  Implementation  of  Nested  Dissection 

Summary: 

We worked on parallel implementation of nested dissection, a numerical method for solving
large sparse systems of linear equations, showing three improvements to the known algorithms
and implementations. These include a reduction in the memory requirements of the algorithm, a
widening of the class of problems solvable with parallel nested dissection (PND) on a
mesh-connected processor array, and a reduction in the asymptotic time bounds of PND.

Details: 

Nested  dissection  is  a  method  for  solving  sparse  linear  systems  of  equations  by  exploiting  the 
graph  structure  underlying  the  input  matrix. 

One  of  the  problems  with  the  known  implementations  of  PND  is  the  large  storage  requirements 
of  the  algorithm;  these  limit  the  size  of  problem  for  which  PND  is  useful.  We  show  that  it  is 
possible  to  significantly  reduce  the  storage  used  by  an  implementation  on  a  processor  array  to  a 
constant  factor  of  the  size  of  the  input  matrix.  This  improvement  can  be  added  without  affecting 
the  time  bounds  of  the  algorithm.  We  are  currently  working  on  an  actual  implementation. 

Using  load  balancing  techniques,  we  show  that  PND  can  be  used  to  solve  a  larger  class  of 
problems on a mesh-connected processor array. In particular, we can use PND to solve any
system  of  equations  whose  underlying  graph  is  of  bounded  degree. 

We improve on the results of Pan and Reif, who showed that PND can be implemented in
O(log^3 n) time. By taking into account the fact that processors are idle during later stages of the
algorithm, several stages can be grouped and performed together to achieve an O(log^2 n) time
algorithm for hypercubes.

An  area  of  ongoing  investigation  is  the  implementation  of  PND  on  various  parallel  models.  One 
possible such model that we are looking at is the hierarchical model of parallel memory
proposed  by  Heywood,  which  may  be  closer  to  existing  parallel  architecture. 


(7) Prokash Sinha with John Reif: Randomized Parallel Algorithms for Min Cost Paths

Summary:

We have completed our initial investigation to derive randomized parallel algorithms for Min
Cost Paths in a Graph of High Diameter. Our present accomplishment is a randomized
sequential algorithm with an order of magnitude performance gain for some dense graphs.

We also found a similar result for the PRAM computational model, which meets the work we proposed
to do in our paper "A Randomized Algorithm for Min Cost Paths in a Graph of High Diameter
Extended Abstract" (J. Reif and P. Sinha). Currently we are in the process of submitting our
findings to technical journals and conferences. Our next phase of work would include similar
derivations of randomized parallel algorithms for a wide variety of discrete structures which arise
naturally in the area of Graph Theory and Combinatorics. Our current research effort is to extend
the techniques of Flajolet and Karp to develop techniques and tools for timing analysis of
algorithms. This effort is to derive tools for semiautomatic randomized analysis.

(8)  Hongyan  Wang  with  John  Reif:  Social  Potential  Fields:  A  Molecular 
Dynamics  Approach  for  Distributed  Control  of  Multiple  Robots. 

Summary: 

Much of the early research in robotic planning and control has considered the case of only a
single robot. There is now a number of robot systems which include a small number of
autonomous robots and consequently there is a quickly growing literature on the planning and
cooperative control of systems of small numbers of robots. Our work is concerned with Very
Large Scale Robotic (VLSR) systems consisting of at least hundreds to perhaps tens of
thousands or more autonomous robots. Our molecular dynamics approach is distributed,
robust, and flexible.

Details:

We  view  our  VLSR  systems  as  a  molecular  dynamics  system,  with  predefined  force  laws 
between  each  ordered  pair  of  components  (robots,  obstacles,  objectives  and  other 
configurations).  These  force  laws  are  similar  to  those  found  in  molecular  dynamics, 
incorporating  both  attraction  and  repulsion  in  the  form  of  inverse  power  laws.  However  these 
laws  may  differ  from  molecular  systems  in  that  we  allow  the  controller  to  arbitrarily  define 
distinct  laws  of  attraction  and  repulsion  for  separate  pairs  and  groups  of  robots  to  reflect  their 
social  relations  or  to  achieve  some  goals.  For  example,  we  define  a  pair-wise  force  law  of 
attraction  and  repulsion  for  a  group  of  identical  robots.  The  repulsion  will  prevent  collision 
among robots and the attraction will keep them in a cluster. This simulates the phenomenon called
"individual  distance"  in  sociobiology. 

Once  the  force  laws  are  set  up  (they  can  be  modified  by  the  global  controller),  each  individual's 
movement  is  computed  locally  according  to  the  local  environment  sensed  by  individual  robots 
and the force laws. Thus the control is distributed and robust. Each robot obeys Newton's
Law and moves according to the total force on it from the other components.
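A pair-wise force law of this kind, and the local computation each robot performs, can be sketched as below. The specific constants and exponents are illustrative assumptions, not the laws used in our simulations, and the function names are hypothetical. The law is an inverse-power repulsion minus an inverse-power attraction, with the repulsion exponent larger so that repulsion dominates at short range.

```python
import math


def pair_force(r, c_rep=1.0, a=3.0, c_att=1.0, b=2.0):
    """Signed force magnitude at distance r > 0.

    Positive means attraction, negative means repulsion.
    Assumes a > b so that repulsion dominates when robots are close.
    """
    return -c_rep / r**a + c_att / r**b


def net_force(i, positions, **law):
    """Total 2-D force on robot i from all other robots.

    positions: list of (x, y) tuples at distinct locations.
    """
    xi, yi = positions[i]
    fx = fy = 0.0
    for j, (xj, yj) in enumerate(positions):
        if j == i:
            continue
        dx, dy = xj - xi, yj - yi
        r = math.hypot(dx, dy)
        f = pair_force(r, **law)
        # Positive f pulls robot i toward robot j along the unit vector.
        fx += f * dx / r
        fy += f * dy / r
    return fx, fy
```

With the default constants the force vanishes at distance 1, which plays the role of the preferred individual distance: closer robots are pushed apart and more distant ones are drawn together.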

We give concrete examples to show that this distributed autonomous control will have many
applications in industry, military and other areas in the future when costs for individual robots
drop and robots can be made much more compact and more capable and flexible.

We did computer simulations involving large numbers of robots. Some interesting and useful
patterns can be achieved by defining proper force laws for the system, e.g. forming a more or
less evenly distributed single cluster, or forming a circle to guard a static point particle standing for
a castle. We are doing more simulations showing more complex patterns.

We also discuss applying spring laws, similar to molecular bonds, to robotic control. Theories of
graph rigidity support the claim that we can design a VLSR system which has a rigid structure. This
also has applications where assemblies are needed to finish some job efficiently.

(9)  Hongyan  Wang  with  John  Reif:  A  Constant  Time  Algorithm  for  N-body  Simulation 
with  Smooth  Distributions. 

Summary:

The N-body simulation problem is as follows: Given N points that have pair-wise interactions,
compute the equilibrium configuration of the N points. This problem is central to a large body
of work in theoretical physics, chemistry, and scientific computing, including cosmology,
plasma simulation, molecular dynamics, and fluid mechanics. The fastest N-body simulation
algorithm, due to Greengard, has time complexity of O(N) for one step of simulation. We propose
to use the concept of a density function to describe the configuration of the large particle system,
and a method to compute the equilibrium density function iteratively from the initial
density function in constant time, with the time complexity depending only on the potential
function and the required precision.

Details: 

In a system with a large number of particles, say millions, we are interested more in the structure of
the system, especially the structure under equilibrium conditions, than in the exact positions of
all particles. Observations from many fields, such as cosmology, plasma simulations,
molecular dynamics, and fluid mechanics, suggest that the distribution of particles is
homogeneous and can be described by smooth functions. Thus we propose to use a density
function to describe the configuration of particle systems.

Based on the fact that under equilibrium conditions the total force on each particle should equal
0, we derive an iterative procedure IMPROVE for improving the density function, which is
of the form phi^(n+1)(x) = IMPROVE(phi^(n)(x)), where x is a position in the domain of
interest. Computing the total force on one particle by discretely summing all the forces from the
other particles would require Omega(n) time. Instead, we sum only the forces from a constant
number of nearby particles. For particles far away, we integrate the force function
multiplied by the density function to approximate the resultant force. This reduces the time
complexity to a constant. Thus each improvement procedure requires constant time, and the
number of iterations depends on the required precision and thus can be constant.
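The near/far splitting described above can be written schematically as follows. The notation is an assumption introduced for illustration: f is the pair-wise force function, N(x) the constant-size set of nearby particles, B(x) the neighborhood containing them, D the domain of interest, and phi^(n) the current density (taken here as a number density).

```latex
F^{(n)}(x) \;\approx\; \sum_{j \in \mathcal{N}(x)} f(x - x_j)
  \;+\; \int_{D \setminus B(x)} f(x - y)\,\phi^{(n)}(y)\,dy
```

The first term costs constant time because N(x) has bounded size; the cost of the second depends on how the integral is evaluated for the given force and density functions, not on the number of particles.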

Simulations showed that the iterative improvement procedure converges. The results showed
that in 1-d the density function has a bell-shaped curve and in 2-d a vault-shaped surface in
the domain of interest, with value 0 outside the domain.

(10)  Akitoshi  Yoshida  with  John  Reif:  Image  and  Video  Compression 

We  considered  several  compression  techniques  using  optical  systems.  Optics  can  offer  an 
alternative  approach  to  overcome  the  limitations  of  current  compression  schemes.  We  gave  a 
simple  optical  system  for  the  cosine  transform.  We  designed  a  new  optical  vector  quantizer 
system  using  holographic  associative  matching  and  discussed  the  issues  concerning  the  system. 

Optical computing has recently become a very active research field. The advantage of optics is
its capability of providing highly parallel operations in a three dimensional space. Image
compression suffers from large computational requirements. We propose optical architectures to
execute various image compression techniques, utilizing the inherent massive parallelism of
optics.

In our paper [RY2], we optically implemented the following compression and corresponding
decompression techniques:
o  transform  coding 
o  vector  quantization 
o  interframe  coding  for  video 

We showed that many generally used transform coding methods, for example, the cosine transform,
can be implemented by a single optical system. Transform coding can be carried out in
constant time.

Most of this paper is concerned with an innovative optical system for vector quantization using
holographic associative matching. Limitations of conventional vector quantization schemes are
caused by a large number of sequential searches through a large vector space. Holographic
associative matching provided by multiple exposure holograms can offer advantageous
techniques for vector quantization based compression schemes. Photo-refractive crystals, which
provide high density recording in real time, are used as our holographic media. The
reconstruction alphabet can be dynamically constructed through training or stored in the
photorefractive crystal in advance. Encoding a new vector can be carried out by holographic
associative matching in constant time.
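To make the sequential-search limitation concrete, a conventional (electronic) vector quantizer can be sketched as below. This is a generic nearest-codeword encoder in Python, not the optical system of our paper; the function names and the squared-Euclidean distortion measure are assumptions for the example.

```python
def vq_encode(vector, codebook):
    """Return the index of the codeword nearest to `vector`.

    Every codeword is examined in turn, so the cost grows linearly
    with the size of the reconstruction alphabet.
    """
    best_index, best_dist = -1, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def vq_decode(index, codebook):
    """Decompression is a table lookup in the reconstruction alphabet."""
    return codebook[index]
```

The loop in `vq_encode` is the search that holographic associative matching is intended to replace with a single constant-time matching operation.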

We  also  discussed  an  extension  of  this  optical  system  to  interframe  coding. 

Ongoing work:

We are investigating optical algorithms for video compression.

(1)  Computational  Geometry  by  Optical  Computers 

Some problems require inherently high degrees of interconnection which may not be provided
by any conventional electrical computers. The advantage of optical computers is their apparent
parallelism in a three dimensional space. Several computational models have been proposed and
constructed. As the progress of optical computers continues, there is a great demand for
designing and investigating various algorithms that are efficient and appropriate for the
proposed models. This situation resembles the one a decade ago, when various algorithms
were investigated for the theoretical VLSI model. Thus, we understand that the investigation of
optical computing algorithms will be essential to the development of optical or hybrid massively
parallel computers.

Optical techniques are particularly suited for processing images. This leads us to believe that
many problems found in computational geometry may be efficiently solved by optical
computers. Some researchers have recently started to investigate some basic problems. We have
been  investigating  these  and  some  other  problems.  We  have  obtained  some  new  results. 

(2)  Optical  Interconnection 

Among processing units placed on a plane, various space-invariant interconnections can be
holographically established in constant time. We are investigating appropriate interconnections
and  efficient  algorithms  for  several  problems. 

(3) Efficient computation for optical scattering

An efficient algorithm to solve the Helmholtz equations was developed by Rokhlin at Yale. We
have been studying his algorithm.

Other  areas  of  research 

Area of interest display:

Designing optical methods for eye tracking.

Simulation of holographic elements